class: inverse, center, middle

Plotting in R (Base graphics)


A pen and paper model

  • Once a plot is produced, you can only add more elements; you cannot remove them.
  • Makes it hard to update.
  • But faster than more complex plotting systems.

Scatter and line charts

First we’ll produce a very simple graph using the values in a numeric vector:

treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
plot(treatment)

Now, let’s customise this plot a little.

First we can plot treatment using points overlaid by a line. We control this with the type argument.

plot(treatment, type="o")
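To get a feel for the type argument, it can help to draw the same vector with several values side by side. The sketch below loops over a handful of the type codes listed in ?plot; the 2 by 3 panel layout set with par(mfrow=) is only a convenience for the comparison.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# A few of the type codes documented in ?plot:
# points, lines, both, overplotted, histogram-like bars, stair steps
types <- c("p", "l", "b", "o", "h", "s")

op <- par(mfrow = c(2, 3))   # 2 x 3 grid of panels; op keeps the old setting
for (tp in types) {
  plot(treatment, type = tp, main = paste0('type = "', tp, '"'))
}
par(op)                      # restore the previous layout
```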

We can add labels to our plot’s axes and a main title or sub-title too.

We add a title with the main argument and a sub-title with the sub argument.

plot(treatment, main="My Plot", sub="a plot")


We can customise our x and y axis labels with the xlab and ylab arguments respectively.

plot(treatment, xlab="Position", ylab="score")


We can control the orientation of the axis labels using the las argument.

.pull-left[

plot(treatment, las=1)

] .pull-right[

plot(treatment, las=2)

]


We can control the size of points in our plot using the cex parameter.

.pull-left[

plot(treatment, cex=2)

] .pull-right[

plot(treatment, cex=0.5)

]


We can control the type of points in our plot using the pch parameter.

.pull-left[

plot(treatment, pch=1)

] .pull-right[

plot(treatment, pch=20)

]
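The available point symbols are easier to choose from when seen together. As a rough sketch, the code below lays out pch values 0 to 25 on a small grid and labels each one with its number; the grid positions and the grey fill passed to bg are arbitrary choices made for this illustration.

```r
# pch values 0 to 25 are the built-in plotting symbols
pch_values <- 0:25

# Arrange the symbols in rows of seven, filling from the top row down
x <- (pch_values %% 7) + 1
y <- 4 - (pch_values %/% 7)

plot(x, y, pch = pch_values, cex = 2, bg = "grey",
     xlim = c(0.5, 7.5), ylim = c(0.5, 4.5),
     axes = FALSE, ann = FALSE)

# Write the pch number under each symbol
text(x, y - 0.35, labels = pch_values, cex = 0.8)
```

Symbols 21 to 25 have a separate fill colour, which is what bg sets here.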


Similarly, when plotting a line we control its width with the lwd parameter.

.pull-left[

plot(treatment, type="l",lwd=10)

] .pull-right[

plot(treatment, type="l",lwd=0.5)

]


We can also control the type of line with lty parameter.

.pull-left[

plot(treatment, type="l",lty=1)

] .pull-right[

plot(treatment, type="l",lty=2)

]


An important parameter we can control is colour. We can control the colour of lines or points using the col argument.

.pull-left[

plot(treatment, type="l",
     col="red")

] .pull-right[

plot(treatment, type="l",
     col="dodgerblue")

]


You can find an extensive list of R colours here.

R colour Chart


Review ?plot and ?par for a complete list of options.
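Many of these options can also be set once for the whole session with par() instead of being repeated in every plot() call. A minimal sketch, assuming we want larger text, horizontal axis labels and solid points as temporary defaults:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# par() returns the previous values of the settings it changes
old <- par(cex = 1.5, las = 1, pch = 20)

plot(treatment, type = "o")   # drawn with the new defaults

par(old)                      # restore the earlier settings
```

Saving the return value and passing it back to par() is a common way to avoid leaking settings into later plots.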


The plot function will also accept two vectors to be plotted against each other.

control <- c(0, 20, 40, 60, 80,100)
plot(treatment,control)


But we often want multiple lines in the same plot. So if we want to plot scores for control and treatment against position we will need a new method.

We can add an additional line to our existing plot using the lines() function.

plot(treatment, type="o", col="blue")
lines(control, type="o", pch=22, lty=2, col="red")

Next let’s change the axes labels to match our data and add a legend. We’ll also compute the y-axis values using the range function so any changes to our data will be automatically reflected in our graph.

range() returns a vector containing the minimum and maximum of all the given arguments.

Calculate range from 0 to max value of data

g_range <- range(0, treatment, control)
g_range
## [1]   0 100

Plot treatment using a y axis that ranges from 0 to the maximum value in the treatment or control vector. Turn off axes and annotations (axis labels) so we can specify them ourselves. We turn off axis and annotation plotting using axes=FALSE and ann=FALSE.

plot(treatment, type="o", col="blue", 
     ylim=g_range,axes=FALSE, ann=FALSE)


We can now create our own x axis by using the axis() function. We specify the side argument for where to place the axis, the at argument for where to put the axis ticks and the labels argument for the tick labels.

axis(side=1, at=1:6, labels=c("Mon","Tue","Wed","Thu","Fri","Sat"))


We can make our y axis with horizontal labels and a tick every 20 units in a similar way.

We specify our side and use the seq() function to make the axis tick positions for the at argument.

axis(2, las=1, at=seq(0,g_range[2],by=20))
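The at argument simply takes a numeric vector of tick positions, so it is worth checking what that vector looks like before handing it to axis(). A small sketch, using a hard-coded g_range of 0 to 100 to stand in for the value computed earlier:

```r
g_range <- c(0, 100)   # stand-in for range(0, treatment, control)

# Evenly spaced tick positions from 0 up to the maximum, in steps of 20
ticks <- seq(g_range[1], g_range[2], by = 20)
ticks
# 0 20 40 60 80 100

# pretty() picks "round" positions automatically for a given range
pretty(g_range)
# 0 20 40 60 80 100
```

seq() gives full control over the step, while pretty() is handy when the range changes and any sensible rounding will do.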


We can now add a box around our plot using the box() function.

box()


Now we can add our control data using the lines() function.

lines(control, type="o", pch=22, lty=2, col="red")

Finally, we may wish to add a legend to our plot. We can add a legend to the current plot using the legend() function.

We need to specify where to place the legend in the plot, pass the names to the legend argument, and repeat any point, line or colour settings we used, e.g. blue/red.

legend("topleft", legend=c("treatment","control"),
       col=c("blue","red"), pch=21:22, lty=1:2)
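Because base graphics builds a figure up call by call, it can be useful to see all the steps in one place. The block below is one way to assemble them; the title() call and its label text are additions made for this sketch, used to put back the annotations that ann=FALSE switched off.

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)
control   <- c(0, 20, 40, 60, 80, 100)
days      <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# y axis limits that follow the data
g_range <- range(0, treatment, control)

# Empty frame: no axes, no annotation
plot(treatment, type = "o", col = "blue",
     ylim = g_range, axes = FALSE, ann = FALSE)

# Custom axes and surrounding box
axis(side = 1, at = seq_along(days), labels = days)
axis(side = 2, las = 1, at = seq(0, g_range[2], by = 20))
box()

# Second series, then titles (hypothetical label text) and legend
lines(control, type = "o", pch = 22, lty = 2, col = "red")
title(main = "Treatment and control", xlab = "Day", ylab = "Score")
legend("topleft", legend = c("treatment", "control"),
       col = c("blue", "red"), pch = c(1, 22), lty = 1:2)
```

The legend uses pch = c(1, 22) here so that its symbols match the default open circle and the square actually drawn.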

Bar Charts

Base graphics has a useful built-in function for bar charts, the barplot() function. We can simply pass our numeric vector to this function to get our bar chart.

barplot(treatment)


The barplot() function hasn’t added any labels by default. We can specify our own, however, using the names.arg argument. names.arg is a vector of names to be plotted below each bar or group of bars.

barplot(treatment,
        names.arg=c("Mon","Tue","Wed","Thu","Fri","Sat"))


If our vector were named, however, the vector’s names would be used for the labels. We use the names() function to add names to our vector and then replot.

names(treatment) <- c("Mon","Tue","Wed","Thu","Fri","Sat")
barplot(treatment)


Let’s now read the data from the example.txt data file.

Read values from tab-delimited example.txt

data <- read.table("data/example.txt", header=TRUE, sep="\t")

Now we can plot the data as a matrix, drawing a side-by-side bar chart with the beside argument.

barplot(as.matrix(data),beside=TRUE)
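For readers without example.txt to hand, the same kind of grouped bar chart can be tried on a small matrix built in place. The matrix below is made up for illustration (it reuses the treatment and control values from earlier); legend.text and args.legend are standard barplot() arguments.

```r
# Illustrative matrix: one row per series, one column per group of bars
scores <- rbind(treatment = c(0.02, 1.8, 17.5, 55, 75.7, 80),
                control   = c(0, 20, 40, 60, 80, 100))
colnames(scores) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# beside = TRUE draws the rows next to each other within each column group
mids <- barplot(scores, beside = TRUE, col = c("blue", "red"),
                legend.text = rownames(scores),
                args.legend = list(x = "topleft"))

# barplot() returns the bar midpoints, useful for adding text or error bars
dim(mids)
```

With beside = FALSE (the default) the rows would be stacked on top of each other instead.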

Histograms

Base graphics has a useful built-in function for histograms too, the hist() function. We can simply pass our numeric vector to this function to get our histogram.

hist(treatment)  


Similar customisation exists as for other plots.

hist(treatment, col="lightblue", ylim=c(0,10))

We can make the histogram coarser or more fine-grained by passing the number of bins we would like to the breaks argument. A single number is treated as a suggestion; R still chooses round break points.

hist(treatment, col="lightblue",breaks = 2)
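To see which bins were actually used, we can inspect the object that hist() returns. A short sketch, using plot = FALSE so that only the binning is computed:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# plot = FALSE returns the histogram object without drawing it
h <- hist(treatment, breaks = 2, plot = FALSE)

h$breaks   # the bin boundaries R settled on
h$counts   # number of values falling in each bin
```

Explicit boundaries can be requested instead by passing a vector, for example breaks = seq(0, 80, by = 10).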


Dot charts

Base graphics has a useful built-in function for dot charts too, the dotchart() function. We can pass a numeric vector or matrix to this function to get our dot chart.

Here we use the function t to return the transpose of a matrix.

dotchart(t(data))   


Let’s make the dotchart a little more colourful:

Now we create a coloured dotchart with smaller labels.

dotchart(t(data), color=c("red","blue"),
         main="Dotchart", cex=0.8)  


Box plots

The final plot we will look at is a box and whisker plot.

Boxplots allow you to quickly review data distributions, showing the median and 1st/3rd quartile.
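The numbers behind a box plot can be inspected directly. As a small sketch on the treatment vector from earlier, quantile() gives the quartiles and boxplot() with plot = FALSE returns the statistics it would draw:

```r
treatment <- c(0.02, 1.8, 17.5, 55, 75.7, 80)

# 1st quartile, median and 3rd quartile
quantile(treatment, probs = c(0.25, 0.5, 0.75))

# The five values drawn by boxplot():
# lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot(treatment, plot = FALSE)
b$stats
```

The hinges used by boxplot() can differ slightly from the quartiles reported by quantile(), since the two use different conventions for small samples.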


First let’s read in the gene expression data.

exprs <- read.delim("data/gene_data.txt", sep="\t", header=TRUE, row.names=1)
head(exprs)
##                    Untreated1 Untreated2  Treated1   Treated2
## ENSDARG00000093639  0.8616832  1.9311442 0.1041508 0.14055604
## ENSDARG00000094508  0.9857575  2.0256352 0.1549917 0.20301609
## ENSDARG00000095893  0.8498889  1.9875580 0.2317969 0.20925123
## ENSDARG00000095252  0.9242996  2.0857620 0.2562264 0.24669079
## ENSDARG00000078878  0.3571734  0.4653908 0.1167221 0.09710237
## ENSDARG00000079403  1.0604071  1.2581398 0.3884836 0.31567299

Now we can use the boxplot() function on our data.frame to get our boxplot

boxplot(exprs)


Perhaps it would look better on a log scale. We can add additional colours and labels as with other plots.

boxplot(log2(exprs),ylab="log2 Expression",
        col=c("red","red","blue","blue"))

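Here we read the same gene expression file again, this time without row.names, so the gene identifiers stay in an ordinary column (ensembl_gene_id). The data has two columns each for untreated and treated samples.

data1 <- read.table("data/gene_data.txt", header=TRUE, sep="\t")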
head(data1)
##      ensembl_gene_id Untreated1 Untreated2  Treated1   Treated2
## 1 ENSDARG00000093639  0.8616832  1.9311442 0.1041508 0.14055604
## 2 ENSDARG00000094508  0.9857575  2.0256352 0.1549917 0.20301609
## 3 ENSDARG00000095893  0.8498889  1.9875580 0.2317969 0.20925123
## 4 ENSDARG00000095252  0.9242996  2.0857620 0.2562264 0.24669079
## 5 ENSDARG00000078878  0.3571734  0.4653908 0.1167221 0.09710237
## 6 ENSDARG00000079403  1.0604071  1.2581398 0.3884836 0.31567299

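We can plot different columns of the data frame in separate panels of one figure. Setting par(mfrow=c(2,2)) arranges the following plots in a 2 by 2 grid, filled row by row. Writing out each call by hand is repetitive; a for loop would do the same more concisely.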

par(mfrow=c(2,2))
hist(data1$Untreated1)
hist(data1$Treated2)
hist(data1$Untreated2)
boxplot(data1$Treated1)
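A loop version might look like the sketch below. Since gene_data.txt may not be available, it uses a small simulated data frame (expr_demo, a made-up stand-in with the same four sample columns); with the real data the loop would run over the sample columns of data1 instead.

```r
# Simulated stand-in for the expression data (values are random, not real)
set.seed(1)
expr_demo <- data.frame(Untreated1 = rexp(100), Untreated2 = rexp(100),
                        Treated1   = rexp(100), Treated2   = rexp(100))

op <- par(mfrow = c(2, 2))           # 2 x 2 grid, filled row by row
for (sample_name in names(expr_demo)) {
  # One histogram per column, titled with the column name
  hist(expr_demo[[sample_name]], main = sample_name, xlab = "Expression")
}
par(op)                              # back to one plot per figure
```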

Saving in bitmap format

bmp(filename = "control.bmp")
plot(control)
dev.off()

Saving in postscript format

postscript(file = "control.ps")
plot(control)
dev.off()
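All of the file devices follow the same pattern: open the device, draw, then close it with dev.off() so the file is written. As a further sketch, the same plot can be saved as a PDF; the temporary output path and the 6 by 4 inch size are arbitrary choices for this example.

```r
control <- c(0, 20, 40, 60, 80, 100)

# Write to a temporary directory so the example leaves no files behind
out_file <- file.path(tempdir(), "control.pdf")

pdf(file = out_file, width = 6, height = 4)   # page size in inches
plot(control, type = "o")
dev.off()                                     # closes the device and writes the file

file.exists(out_file)
```

Forgetting dev.off() is a common cause of empty or unreadable output files.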

Exercise on base plotting can be found here


Answers for baseplotting can be found here